
feat(review): kb.triage_pending — advisory triage scoring for the pending queue #345

Open
jsdevninja wants to merge 1 commit into vouchdev:main from jsdevninja:feat/322-triage-pending


@jsdevninja jsdevninja commented Jul 3, 2026

Contributor

summary

closes #322.

a long kb.list_pending forces a reviewer to reconstruct, per proposal,
whether the claim fits the existing kb, whether its citations resolve,
whether it duplicates something already filed, and whether it contradicts
an approved claim. those signals already exist in scattered form
(find_similar_on_propose, proposals._payload_block_reason) but nothing
surfaces them together as a ranked, explained view.

this adds an optional read-side triage pass over the pending queue:

  • kb.triage_pending(proposal_ids=None) returns each pending proposal's
    model_dump plus a _meta.vouch_triage block: recommendation
    (approve / reject / needs-human, advisory only), score (0.0-1.0),
    signals (fit, citation_quality, duplication_risk,
    contradiction_risk, each with its own score + rationale), and a short
    rationale string.
  • vouch triage [proposal-id...] mirrors it on the cli, with --json and
    --reverse.
  • opt-in: disabled unless triage.enabled: true is set in
    .vouch/config.yaml; per-signal triage.weights are configurable.
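
For orientation, the `_meta.vouch_triage` block described above might be assembled roughly as follows. This is a minimal sketch, not the code in `src/vouch/triage.py`: the `Signal`, `compose`, `recommend`, and `triage_block` names, the inversion of the two risk signals, and the 0.75 / 0.35 thresholds are all assumptions made for illustration. Only the four signal names and the block's keys come from the description.

```python
from dataclasses import dataclass, asdict

# Signal names come from the PR description; everything else is hypothetical.
RISK_SIGNALS = {"duplication_risk", "contradiction_risk"}


@dataclass
class Signal:
    score: float      # 0.0-1.0
    rationale: str


def compose(signals: dict[str, Signal], weights: dict[str, float]) -> float:
    """Weighted mean of the signals; risk signals count as (1 - score)."""
    total = sum(weights[name] for name in signals)
    if total == 0:
        return 0.0
    acc = 0.0
    for name, sig in signals.items():
        value = 1.0 - sig.score if name in RISK_SIGNALS else sig.score
        acc += weights[name] * value
    return round(acc / total, 3)


def recommend(score: float) -> str:
    # Assumed thresholds; the string is advisory and nothing consumes it.
    if score >= 0.75:
        return "approve"
    if score <= 0.35:
        return "reject"
    return "needs-human"


def triage_block(signals: dict[str, Signal], weights: dict[str, float]) -> dict:
    """Build a dict with the four keys named in the PR description."""
    score = compose(signals, weights)
    return {
        "recommendation": recommend(score),
        "score": score,
        "signals": {name: asdict(sig) for name, sig in signals.items()},
        "rationale": "; ".join(f"{n}: {s.rationale}" for n, s in signals.items()),
    }
```

Raising the weight of one signal in this sketch shifts the composite toward that signal, which is the kind of tuning the configurable `triage.weights` would allow.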

review-gate scope

read-only by construction — this is the load-bearing property the north
star in CLAUDE.md calls out. the pass never calls proposals.approve,
proposals.reject, store.put_*, or store.move_proposal_to_decided; a
human still calls kb.approve / kb.reject. recommendation is an
advisory string nothing else consumes.
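
One way to exercise a read-only guarantee like this in a test is to hand the scoring code a proxy that refuses mutating calls. The sketch below is an illustration only and is not claimed to be how `tests/test_triage.py` checks the invariant; `ReadOnlyStore`, `FakeStore`, and the write-method prefixes are hypothetical.

```python
class ReadOnlyStore:
    """Proxy that forwards reads and raises on any mutating call.

    The prefixes are illustrative; a real store may name its mutators
    differently.
    """

    _WRITE_PREFIXES = ("put_", "move_", "delete_")

    def __init__(self, inner):
        self._inner = inner

    def __getattr__(self, name):
        # Only reached for attributes not found on the proxy itself.
        if name.startswith(self._WRITE_PREFIXES):
            raise PermissionError(f"triage is read-only; refused {name}()")
        return getattr(self._inner, name)


class FakeStore:
    """Stand-in store for the example (hypothetical, not vouch's store)."""

    def __init__(self):
        self.pending = ["p1", "p2"]
        self.writes = 0

    def list_proposals(self):
        return list(self.pending)

    def put_claim(self, claim):
        self.writes += 1
```

Passing `ReadOnlyStore(store)` into a scoring function makes any accidental write fail loudly instead of silently changing the queue.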

  • citation_quality reuses proposals._payload_block_reason (the same
    dangling-ref / invalid-payload gate check_approvable uses).
  • duplication_risk reuses the propose-time embedding path
    (embeddings.similarity.find_similar_on_propose) for claims, and
    degrades to a difflib text-similarity heuristic when no embedder is
    registered (base install, no [embeddings] extra) — the block shape
    stays the same either way.
  • fit runs its own lower-threshold embedding search
    (index_db.search_embedding) rather than reusing the near-duplicate-only
    hits from find_similar_on_propose — reusing those directly would let a
    literal duplicate's high "fit" score cancel out its own
    duplication_risk penalty in the composite.
  • contradiction_risk looks for topically-related approved claims that
    share an entity with the proposal but disagree on a simple negation-word
    signal (heuristic, advisory only — same caveats as the other signals).
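
To make the last bullet concrete, a negation-word polarity check could look like the sketch below. The word list, the tokenizer, the function names, and the all-or-nothing 1.0 / 0.0 score are assumptions; the PR text only says that claims sharing an entity are compared on a simple negation-word signal.

```python
import re

# Illustrative negation vocabulary; the PR does not list the actual words.
NEGATIONS = {"not", "no", "never", "cannot", "without", "isn't", "doesn't", "won't"}


def _tokens(text: str) -> list[str]:
    # Lowercase word tokens, keeping straight apostrophes so "isn't" survives.
    return re.findall(r"[a-z0-9']+", text.lower())


def is_negated(text: str) -> bool:
    return any(tok in NEGATIONS for tok in _tokens(text))


def contradiction_risk(proposal_text, proposal_entities, approved):
    """approved: iterable of (text, entities) pairs for approved claims.

    Returns (score, rationale). The score is 1.0 when a claim that shares an
    entity has the opposite polarity, otherwise 0.0.
    """
    for text, entities in approved:
        shared = set(proposal_entities) & set(entities)
        if shared and is_negated(text) != is_negated(proposal_text):
            return 1.0, f"polarity differs from approved claim on {sorted(shared)}"
    return 0.0, "no polarity conflict found"
```

A check this simple flags "X is enabled" against "X is not enabled" but also misfires on double negatives and unrelated negations, which is why the signal is advisory.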

registered at all four kb.* surface sites: server.py (@mcp.tool()),
jsonl_server.py (_h_triage_pending + HANDLERS), capabilities.py
(METHODS), and cli.py (vouch triage).

test plan

  • tests/test_triage.py (25 tests): output shape, the no-write
    invariant (pending proposals stay pending, nothing approved/rejected/
    created), the disabled-by-default opt-in gate, citation_quality
    forcing reject on a dangling-ref proposal, duplication_risk on both
    the heuristic and embedding backends (--backend heuristic config
    override too), fit's entity-overlap + topical scoring, contradiction_risk's
    polarity-conflict heuristic, proposal_ids filtering, config/weights
    plumbing, and cli / jsonl wiring
  • .venv/bin/python -m pytest tests/ -q --ignore=tests/embeddings — full
    suite green apart from 7 pre-existing Windows-only failures (verified
    identical on a clean main checkout — os.getuid(), symlink
    privilege, and path-separator assertions) and 2 pre-existing hangs in
    test_http_server*.py (also reproduce on a clean checkout, unrelated
    to this change)
  • .venv/bin/python -m mypy src reports no new errors (the same 2 pre-existing
    Windows/missing-stub errors appear on a clean checkout)
  • .venv/bin/python -m ruff check src tests — clean

Summary by CodeRabbit

  • New Features

    • Added advisory triage scoring for pending items, with a score, recommendation, and rationale shown in the app, CLI, and API responses.
    • Added support for viewing triage results in JSON and ranked human-readable output.
    • Triage can be enabled through configuration and includes a fallback when advanced similarity support isn’t available.
  • Tests

    • Expanded automated coverage for triage behavior, output format, configuration, and command/API wiring.

…ding queue

a long `kb.list_pending` forces the reviewer to reconstruct, per proposal,
whether the claim fits the existing kb, whether its citations resolve,
whether it duplicates something already filed, and whether it contradicts
an approved claim. this adds an optional triage pass that scores each
pending proposal on those four signals and attaches a `_meta.vouch_triage`
block (recommendation/score/signals/rationale) to help a reviewer
prioritize, without ever deciding anything itself.

read-only by construction: the pass never calls proposals.approve/reject,
store.put_*, or store.move_proposal_to_decided — a human still calls
kb.approve/kb.reject. citation_quality reuses proposals._payload_block_reason;
duplication_risk reuses the propose-time embedding similarity path
(embeddings.similarity.find_similar_on_propose) and degrades to a difflib
heuristic when the embeddings extra isn't installed. fit uses a separate,
lower-threshold embedding search so a near-duplicate hit doesn't also
inflate fit and cancel out its own duplication penalty.
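
The difflib fallback mentioned above can be sketched as follows. This is an assumed shape, not the PR's code: `heuristic_duplication_risk`, `duplication_risk`, and the `embedding_search` callable are hypothetical names, and the real embedding path goes through `find_similar_on_propose`. The point is that both branches return the same `(score, rationale)` pair.

```python
from difflib import SequenceMatcher


def _ratio(a: str, b: str) -> float:
    # Case-insensitive similarity in [0.0, 1.0].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def heuristic_duplication_risk(text: str, corpus: list[str]) -> tuple[float, str]:
    """Highest text-similarity ratio against the existing claim texts."""
    if not corpus:
        return 0.0, "no existing claims to compare against"
    closest = max(corpus, key=lambda other: _ratio(text, other))
    return round(_ratio(text, closest), 3), f"closest existing text: {closest!r}"


def duplication_risk(text, corpus, embedding_search=None):
    """Use the embedding path when one is supplied, else fall back to difflib.

    `embedding_search` is a hypothetical callable returning (score, rationale),
    so callers see the same shape from either backend.
    """
    if embedding_search is not None:
        return embedding_search(text)
    return heuristic_duplication_risk(text, corpus)
```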

opt-in via `triage.enabled: true` in config.yaml (default false). registered
at all four kb.* surface sites (server.py, jsonl_server.py, capabilities.py,
cli.py) plus `vouch triage [proposal-id...]` with `--json` and `--reverse`.
@coderabbitai

coderabbitai Bot commented Jul 3, 2026


📝 Walkthrough


This PR adds an advisory kb.triage_pending capability that scores pending proposals on fit, citation quality, duplication risk, and contradiction risk, attaching a _meta.vouch_triage block. It is wired into capabilities, MCP server, JSONL server, and CLI (vouch triage), with tests and a changelog entry.

Changes

kb.triage_pending advisory scoring

| Layer | File(s) | Summary |
|---|---|---|
| Triage config and scoring signals | src/vouch/triage.py | New module implementing TriageConfig, triage_cfg, embedding/heuristic helper logic, four scoring signals (fit, citation_quality, duplication_risk, contradiction_risk), score composition, recommendation/rationale mapping, and score_proposal/triage_pending entrypoints with a TriageError for disabled triage. |
| Capability, MCP, and JSONL registration | src/vouch/capabilities.py, src/vouch/server.py, src/vouch/jsonl_server.py | Adds "kb.triage_pending" to METHODS, a new kb_triage_pending MCP tool wrapping triage_pending with error normalization, and a JSONL _h_triage_pending handler wired into HANDLERS. |
| CLI triage command | src/vouch/cli.py | Adds vouch triage command supporting optional proposal IDs, --json, and --reverse, sorting by score and printing ranked JSON or table output with rationale. |
| Triage test suite | tests/test_triage.py | New end-to-end tests covering enablement, output schema, no-mutation invariant, per-signal behavior (citation quality, duplication, fit, contradiction), embeddings vs heuristic fallback, config plumbing, and JSONL/CLI wiring. |
| Changelog entry | CHANGELOG.md | Documents the new kb.triage_pending capability, its metadata, read-only behavior, backend fallback, and CLI usage. |

Estimated code review effort: 3 (Moderate) | ~30 minutes

Sequence Diagram(s)

sequenceDiagram
  participant Reviewer
  participant CLI as vouch CLI / MCP / JSONL
  participant triage_pending
  participant score_proposal
  participant KBStore

  Reviewer->>CLI: request triage (proposal_ids, --json/--reverse)
  CLI->>triage_pending: triage_pending(store, proposal_ids)
  triage_pending->>KBStore: check triage.enabled config
  alt triage disabled
    triage_pending-->>CLI: raise TriageError
  else triage enabled
    triage_pending->>KBStore: fetch pending proposals
    loop each proposal
      triage_pending->>score_proposal: compute signals & score
      score_proposal-->>triage_pending: score, recommendation, rationale
    end
    triage_pending-->>CLI: annotated proposals (_meta.vouch_triage)
    CLI-->>Reviewer: ranked JSON/table output
  end
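
The flow in the diagram can be approximated in a few lines. This is a simplified sketch under stated assumptions: the real entrypoint is called as `triage_pending(store, proposal_ids)`, whereas here the config and pending list are passed in directly, `score_fn` stands in for `score_proposal`, and the meaning of `reverse` in `rank` (lowest score first) is a guess at what `--reverse` does.

```python
class TriageError(Exception):
    """Raised when triage is requested but not enabled (name from the PR)."""


def triage_pending(config, pending, score_fn, proposal_ids=None):
    """config: dict parsed from config.yaml; pending: list of proposal dicts."""
    if not config.get("triage", {}).get("enabled", False):
        raise TriageError("triage is disabled; set triage.enabled: true")
    wanted = set(proposal_ids) if proposal_ids else None
    annotated_all = []
    for proposal in pending:
        if wanted is not None and proposal["id"] not in wanted:
            continue
        annotated = dict(proposal)  # shallow copy; the input is never mutated
        annotated["_meta"] = {"vouch_triage": score_fn(proposal)}
        annotated_all.append(annotated)
    return annotated_all


def rank(results, reverse=False):
    """Highest score first by default; reverse=True flips the order (assumed)."""
    return sorted(
        results,
        key=lambda r: r["_meta"]["vouch_triage"]["score"],
        reverse=not reverse,
    )
```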

Suggested reviewers: plind-junior

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly names the new kb.triage_pending advisory scoring feature and its review-queue scope. |
| Linked Issues check | ✅ Passed | The changes add the read-only triage pass, register it across MCP/JSONL/capabilities/CLI, keep it opt-in, and cover no-write and fallback behavior in tests. |
| Out of Scope Changes check | ✅ Passed | All additions are directly tied to triage scoring, wiring, docs, and tests; no unrelated feature work is apparent. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@github-actions github-actions Bot added the labels docs (documentation, specs, examples, and repo guidance), cli (command line interface), mcp (mcp, jsonl, and http surfaces), tests (tests and fixtures), and size: XL (1000 or more changed non-doc lines) on Jul 3, 2026

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
src/vouch/triage.py (1)

451-473: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Redundant embedder fetch and corpus reads per proposal.

Within a single score_proposal call, _safe_embedder() is invoked once inside _embedding_hits_for_claim (line 172, just to check whether it is None) and again at line 459 to obtain the actual embedder used by _signal_fit. Separately, in the heuristic (no-embeddings) path, _signal_duplication_risk (line 329) and _signal_contradiction_risk (line 375) each independently call _claim_text_pool, which re-reads store.list_claims() and store.list_proposals(...) from scratch for the same proposal. Across the triage_pending loop (lines 496-503) this duplicates I/O and embedder instantiation for every pending proposal.

Consider computing the embedder and the claim/proposal pool once per proposal (or once per triage_pending call, if get_embedder() isn't already cached) and threading them into _embedding_hits_for_claim, _signal_fit, _signal_duplication_risk, and _signal_contradiction_risk instead of recomputing.
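
A sketch of the suggested refactor, assuming the helpers can accept a shared context object: the expensive lookups run once and the result is threaded into every signal. `TriageContext`, `build_context`, and `score_all` are hypothetical names, and the `get_embedder` / `load_pool` callables stand in for `_safe_embedder()` and `_claim_text_pool`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class TriageContext:
    embedder: Optional[Any]        # None when no embedder is registered
    claim_pool: tuple[str, ...]    # claim/proposal texts, read once


def build_context(
    get_embedder: Callable[[], Optional[Any]],
    load_pool: Callable[[], list[str]],
) -> TriageContext:
    """Run the expensive lookups exactly once per triage run."""
    return TriageContext(embedder=get_embedder(), claim_pool=tuple(load_pool()))


def score_all(proposals, ctx, signal_fns):
    """Each signal receives the same ctx instead of re-reading the store."""
    return [
        {name: fn(proposal, ctx) for name, fn in signal_fns.items()}
        for proposal in proposals
    ]
```

Whether the context should be built per proposal or per `triage_pending` call depends on whether the pool can change mid-run; for a read-only pass, once per call seems sufficient.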

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/triage.py` around lines 451 - 473, score_proposal is doing
redundant work by fetching the embedder and rebuilding the claim/proposal corpus
multiple times for the same proposal. Compute the embedder once in
score_proposal and thread it into _embedding_hits_for_claim and _signal_fit
instead of calling _safe_embedder() separately, and precompute the shared claim
text pool/corpus once per proposal (or per triage_pending run) so
_signal_duplication_risk and _signal_contradiction_risk can reuse it rather than
each calling _claim_text_pool again. Update the helper signatures accordingly
and preserve the existing behavior in score_proposal and triage_pending.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 2fc8659e-96d3-47b9-9050-0d89b1204652

📥 Commits

Reviewing files that changed from the base of the PR and between 5f58c69 and a55cac8.

📒 Files selected for processing (7)
  • CHANGELOG.md
  • src/vouch/capabilities.py
  • src/vouch/cli.py
  • src/vouch/jsonl_server.py
  • src/vouch/server.py
  • src/vouch/triage.py
  • tests/test_triage.py


Development

Successfully merging this pull request may close these issues.

feat: kb.triage_pending — advisory scoring on pending proposals for reviewers

1 participant